Energy saving strategy of cloud data computing based on convolutional neural network and policy gradient algorithm

Cloud Data Computing (CDC) is conducive to precise energy-saving management of user data centers based on the real-time energy consumption monitoring of Information Technology equipment. This work aims to obtain the most suitable energy-saving strategies to achieve safe, intelligent, and visualized energy management. First, the theory of Convolutional Neural Network (CNN) is discussed. Besides, an intelligent energy-saving model based on CNN is designed to ameliorate the variable energy consumption, load, and power consumption of the CDC data center. Then, the core idea of the policy gradient (PG) algorithm is introduced. In addition, a CDC task scheduling model is designed based on the PG algorithm, aiming at the uncertainty and volatility of the CDC scheduling tasks. Finally, the performance of different neural network models in the training process is analyzed from the perspective of total energy consumption and load optimization of the CDC center. At the same time, simulation is performed on the CDC task scheduling model based on the PG algorithm to analyze the task scheduling demand. The results demonstrate that the energy consumption of the CNN algorithm in the CDC energy-saving model is better than that of the Elman algorithm and the ecoCloud algorithm. Besides, the CNN algorithm reduces the number of virtual machine migrations in the CDC energy-saving model by 9.30% compared with the Elman algorithm. The Deep Deterministic Policy Gradient (DDPG) algorithm performs the best in task scheduling of the cloud data center, and the average response time of the DDPG algorithm is 141. In contrast, the Deep Q Network algorithm performs poorly. This paper proves that Deep Reinforcement Learning (DRL) and neural networks can reduce the energy consumption of CDC and improve the completion time of CDC tasks, offering a research reference for CDC resource scheduling.


Introduction
With the rapid progress of Information Technology (IT) and cloud technology, data centers have become the foundation that supports business development and an essential path for enterprise growth [1]. As the social economy develops rapidly, markets are also accelerating, and the construction of data centers has received significant attention. Under the general trend of energy saving, establishing a sound data center energy monitoring mechanism is a prerequisite for energy planning and energy management. Traditional hardware-based detection generally refers to the direct measurement of the power consumption of the system under test through external power measurement devices or customized collectors. This approach is feasible in small-scale data centers, but its biggest drawback is that it does not meet the need for low-cost and easily scalable monitoring. In contrast, software-based energy consumption monitoring mechanisms can enable multi-granular, highly scalable monitoring systems cost-effectively. They are well suited to the complex, heterogeneous, and frequently scaled equipment environments in CDC centers. This work proposes a CDC task scheduling model with a policy gradient algorithm to achieve energy-saving estimation and prediction and to explore the application and generalization capability of the energy-saving model on growing data sets. The algorithm reported here converges well in scenarios with high heterogeneity and task volatility in CDC clusters. It markedly improves the command response time ratio of tasks, promotes cluster load balancing, and realizes the computing energy-saving strategy for cloud data centers in terms of time and device energy consumption.
Yan et al. (2019) conducted simulation tests on traditional energy consumption measurement models and virtualized energy consumption models to address the problem of high energy consumption in CDC data centers.
They found that traditional energy models usually capture server power consumption through external power measurement devices or custom harvesters. This approach is suitable for small, modular, homogeneous data center environments. The virtualized energy consumption model has the advantages of low cost, easy expansion, and finer granularity. At the same time, it can realize data collection and prediction at the virtual machine and container level [12]. Tahir et al. (2020) established a virtual integration mechanism in the CDC center, using virtual machines to give the CDC data computing center efficient resource scheduling [13]. Hao et al. (2020) performed task prediction on data centers via K-means and Extreme Learning Machines [14]. Wu et al. (2019) analyzed the energy consumption of servers with different configurations in the data center through a time-series neural network model to find the best low-energy server configuration method [15]. Shen et al. (2022) studied the multi-granularity energy consumption model of the CDC data center, compared the energy consumption models of components and programs, and proposed an energy consumption model with high accuracy, low cost, good generalization ability, high portability, and robustness [16]. Ebadi et al. (2019) divided the information and communication facilities of CDC data centers into different energy consumption units. The energy consumption statistics of the different facility units suggested that the energy consumption of servers and storage devices was 40% higher than that of network devices, which was about 35% higher than the power consumption of the power supply unit. Because the energy consumption of the network equipment and the power supply unit is lower, the energy consumption of the information and communication facilities in the CDC data center is mainly concentrated in the server and storage equipment [17]. Shah et al. (2022) used the Central Processing Unit (CPU) load index to estimate the power consumption of the entire data center server. The authors estimated the CPU load index for compute-intensive workloads as well as hard disk-intensive and memory-intensive workloads. The research results indicated that the CPU accounted for 58% of the energy consumption over the whole load process [18]. Zeng et al. (2021), in a study of the factors influencing CPU energy consumption, found that CPU energy consumption is related to the CPU's working frequency. When each CPU thread is fully loaded, the higher the CPU's working frequency, the greater the energy consumption [19]. Huang et al. (2021) estimated the energy consumption of the data system CPU based on CPU utilization, with the energy consumption of the CPU taken to be proportional to the CPU utilization [20]. Lin et al. (2020) used an exponential power function to study the energy consumption of popular servers from 2007 to 2010. They found that a functional model with an exponent of 0.75 could express these servers' energy consumption performance [21].

Research on task scheduling in the cloud background
There are also many task scheduling studies on the CDC platform. Chen et al. (2020) optimized task scheduling in the CDC environment to meet users' needs and reduce the task completion time [22]. Khorsand et al. (2020) analyzed the task scheduling optimization model in the CDC environment from the perspective of operators. Energy consumption can be reduced by improving the utilization of cluster resources in data center facilities on the premise of ensuring server load balance [23]. At the same time, task scheduling in the cloud environment can also be realized through algorithms. Ghobaei-Arani et al. (2020) used the Moth Flame Optimization algorithm to assign the optimal set of tasks to the data center nodes of CDC to minimize the total execution time of tasks [24]. Munir (2019) handled the movement of uncertain task clusters in the CDC environment through an online management method based on Mobile Edge Computing and Reinforcement Learning (RL), thereby achieving optimal task movement management [25]. Nassar (2019) combined RL and the greedy algorithm for crowd-aware task scheduling in mobile social networks. The optimal combination of the algorithms improved the perception efficiency and saved energy [26]. Dong et al. (2020) used the RL-Task Scheduling algorithm to schedule tasks with priority relationships dynamically onto cloud servers. The research showed that the algorithm minimizes the task execution time [27]. Wang (2019) studied the scheduling strategy of highly heterogeneous tasks in the cloud environment. The study also replaced the ε-greedy action selection strategy of the Deep-Q-Network (DQN) algorithm with a Boltzmann action selection strategy, which improved the exploration ability of the algorithm [28]. Table 1 summarizes the characteristics of domestic and foreign scholars' studies.
To sum up, scholars have studied energy-saving scheduling and task scheduling in the CDC context from the perspectives of the energy consumption model and the energy consumption of the data center CPU. Traditional energy-efficient task scheduling algorithms such as round-robin scheduling and ant colony optimization have been used extensively in many cloud computing systems. Because workloads are dynamic, most existing online energy-saving algorithms are based on heuristics and rely heavily on historical experience. They can solve the resource scheduling problem effectively under the conditions they were designed for. However, once the scheduling target or resource pool changes, the designed heuristics can no longer be used because of their static nature, and the designer must redesign the scheduling algorithm for the new environment. Moreover, there is no research on low-energy-consumption and energy-saving strategies for task scheduling of CDC data centers in the cloud environment. Machine Learning models can predict future load demand and scale resources as needed to provide excellent Quality of Service and user experience. These techniques are valuable for service providers to offer reliable services and maintain market leadership. In addition, neural networks and policy gradient algorithms, both branches of machine learning, have evolved rapidly in recent years. Such methods use monitoring and feedback data to continuously optimize and improve policies and obtain optimal decisions through continuous interaction between the agent and the environment. This adaptive approach adjusts the agent's decisions and policies according to the current requirements, workload, and the underlying system's state. Therefore, resource allocation policies for these tasks need to be intelligent to meet modern, changing cloud data energy-saving task requirements and cloud environments. This work combines Deep Reinforcement Learning (DRL) and neural networks to reduce the energy consumption of task scheduling in CDC.

CNN theory
This work designs an intelligent energy-saving model based on CNN according to the characteristics of the CDC center with variable energy consumption, load, and power consumption [29]. Besides, a scheduling model of cloud data computing tasks based on the policy gradient algorithm is designed to handle the uncertainty and volatility of CDC scheduling tasks [30]. CNN is a Feedforward Neural Network whose neurons respond to a part of the surrounding units within their coverage area. It has long been used in computer vision. Its core is the "convolution and pooling" operations. A CNN is a multi-layer perceptron designed for recognizing two-dimensional shapes (such as images), consisting of an input layer, hidden layers (convolutional and pooling layers), and an output layer. There can be many hidden layers; each layer consists of one or more two-dimensional planes, and each plane consists of multiple independent neurons. A neuron in the convolutional layer of a CNN is only connected to some of its neighbors. Unlike ordinary neural networks, CNNs contain feature extractors consisting of convolutional and pooling layers [31]. Fig 1 presents the CNN structure, showing a CNN model of a computer data center. The features are input and then reach the convolution layer. The convolution layer convolves the 3×3 convolution kernel with the data to extract the local features of the computer data center. The pooling layer mainly reduces the computational complexity by reducing the dimension of the output matrix of the previous layer, extracting the practical features of the computer data center. Then, the fully connected layer connects the data features. Finally, the output result is predicted [32]. The Rectified Linear Unit (ReLU) activation function is used in the convolutional layer, and the Softmax activation function is used in the fully connected layer to classify data features.
Eqs (1) and (2) express the activation functions.
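As a minimal sketch of the two activation functions, assuming they take their standard forms ReLU(x) = max(0, x) and Softmax(x)_i = exp(x_i) / Σ_j exp(x_j), the following Python fragment may help. The function names and sample inputs are illustrative and not taken from the original work.

```python
import math

def relu(x):
    """Return max(0, x): zero for negative inputs, the input itself otherwise."""
    return max(0.0, x)

def softmax(scores):
    """Map a list of real-valued scores to probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative inputs only.
print([relu(v) for v in (-2.0, 0.0, 3.5)])
print(softmax([1.0, 2.0, 3.0]))
```

Subtracting the largest score before exponentiating does not change the result, because the common factor cancels in the ratio; it only keeps the intermediate values in a safe numeric range.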
In Eqs (1) and (2), the ReLU activation function outputs 0 or a positive number. When the Softmax activation function is applied to multi-class problems with more than two classes, each class label used must indicate class membership. Within a particular convolutional layer, the same activation function is applied to all data features, although different convolutional layers may use different activation functions. In the convolution operation, let the input matrix be x(i+m, j+n) and the convolution kernel be w(m, n). Then, the convolution process can be expressed as Eq (3).
In Eq (3), Z(i,j) indicates the result of the convolution operation obtained by multiplying the convolution kernel w(m, n) with the original data features. The convolution input is a 6×6 matrix, and the convolution kernel is a 3×3 data matrix. The convolution is performed by moving the kernel one data element at a time. First, the 3×3 region in the upper-left corner of the 6×6 input is convolved with the convolution kernel: the elements at each position are multiplied and summed, and the result (4 in this example) is the corresponding element of the output matrix Z(i,j). By analogy, the final output convolution matrix is obtained. Eq (4) describes the calculation process.
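The sliding-window computation described above can be sketched as follows. This is a hedged illustration that assumes a square input, a square kernel, a stride of one, and no padding; the input and kernel values are hypothetical and are not the values of the original figure.

```python
def conv2d_valid(x, w):
    """Slide a square kernel w over a square input x with stride 1 and no padding.

    Computes Z(i, j) = sum over m, n of x(i + m, j + n) * w(m, n).
    """
    k = len(w)
    out = len(x) - k + 1
    return [
        [
            sum(x[i + m][j + n] * w[m][n] for m in range(k) for n in range(k))
            for j in range(out)
        ]
        for i in range(out)
    ]

# Hypothetical data: a 6x6 input containing a vertical edge, and a 3x3 kernel
# that responds to vertical edges.
x = [[1, 1, 1, 0, 0, 0] for _ in range(6)]
w = [[1, 0, -1] for _ in range(3)]

z = conv2d_valid(x, w)
print(len(z), len(z[0]))  # 4 4
print(z[0])               # [0, 3, 3, 0]
```

With a 6×6 input and a 3×3 kernel, the kernel fits in four positions along each axis, which is why the output is 4×4.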
In Eq (4), n_in represents the number of input matrices; X_k refers to the kth input matrix; W_k denotes the kth sub-convolution kernel in the convolution kernel; Z_ij represents the value of the element at the corresponding position of the output matrix produced by the convolution operation. The data size after convolution is calculated according to Eq (5).
In Eq (5), n stands for the data size extracted in the computer; p signifies the number of padding data columns at the edge of the original data; f represents the size of the convolution kernel of the filter; s stands for the moving step size of the filter over the data features. This completes the convolution operation. Eq (6) describes the calculation output of the pooling operation in the CNN.
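As a small worked check of the size relation, the sketch below assumes Eq (5) has the standard form ⌊(n + 2p − f) / s⌋ + 1 with the symbols n, p, f, and s defined above. The function name is illustrative.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output side length for input size n, kernel size f, padding p, stride s."""
    return (n + 2 * p - f) // s + 1

print(conv_output_size(6, 3))        # 4: the 6x6 input and 3x3 kernel case
print(conv_output_size(6, 3, p=1))   # 6: one column of padding keeps the size
print(conv_output_size(7, 3, s=2))   # 3: a stride of two roughly halves the size
```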
In Eq (6), pool represents the process of reducing the size of the input data according to the pooling area k and the pooling criterion; $a^{l-1}$ is the input tensor obtained by edge-filling the input data matrix and applying convolution. The fully connected layer calculates the output according to Eq (7).
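The pooling step can be illustrated with a short sketch. The text does not fix the pooling criterion, so the choice of max pooling over non-overlapping k×k areas is an assumption here, and the input values are hypothetical.

```python
def max_pool2d(a, k):
    """Non-overlapping k x k max pooling over a 2D list (stride equal to k)."""
    rows, cols = len(a) // k, len(a[0]) // k
    return [
        [
            max(a[i * k + m][j * k + n] for m in range(k) for n in range(k))
            for j in range(cols)
        ]
        for i in range(rows)
    ]

# Hypothetical 4x4 feature map.
a = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 6],
]
print(max_pool2d(a, 2))  # [[4, 5], [3, 7]]
```

Each 2×2 area is replaced by its largest entry, so a 4×4 input shrinks to 2×2 while the strongest responses are kept.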
In Eq (7), l signifies the fully connected layer; $b^{l}$ refers to the threshold of the fully connected layer; σ stands for the activation function. Usually, Sigmoid and tanh are taken as the activation function. After several fully connected layers, the last layer is the output layer activated by Softmax. Eq (8) indicates the calculation of the output layer.
Assume that α is the step size of the gradient iteration, and that a maximum number of iterations and a stop-iteration threshold are given. The output $\delta^{i,l}$ of the fully connected layer is obtained by calculating the loss function. Eqs (9) and (10) express the calculation in the fully connected layer after updating $W^{l}$ and $b^{l}$.

Core idea of the PG algorithm
The PG algorithm optimizes the agent's policy directly. The policy (strategy) is the method of selecting actions in a given state. The input of the neural network used in the PG algorithm is the current state of the agent. The output is the corresponding action: for a discrete action space, the probability of taking each action; for a continuous action space, a probability distribution over actions. The PG algorithm models the policy function and then uses gradients to update the parameters of the network. However, there is no actual loss function in RL. The purpose of the PG algorithm is to maximize the expected value of the cumulative reward. Therefore, the expected value of the cumulative reward is used as the objective in place of a loss function, and it is maximized through the Gradient Ascent algorithm [33]. According to this idea, the core idea of the PG algorithm can be expressed by Eq (11).
In Eq (11), π represents the strategy, and r denotes the return value obtained at the moment t. The return values of the whole process are added to obtain the cumulative return value from the beginning to the end of the trajectory. The expectation $E_{\pi}$ of cumulative returns can be simply understood as taking the average over all possible processes. $s_t$ represents the state of the Neural Network obtained by the agent at the moment t. $a_t$ indicates the action of the agent at the moment t. A Parameterized Neural Network is used to express the strategy $\pi_\theta$. Then, the maximum expected return is calculated via Eq (12).
In Eq (12), τ represents the complete path from start to finish. The gradient ascent algorithm is used to find the maximum value, as presented in Eq (13).
In Eq (13), $\theta^{*}$ signifies the parameters that maximize the strategy objective, α stands for the step size used in maximizing the payoff function, and θ represents the strategy parameters. First, the gradient of the objective function is calculated, as shown in Eq (14).
The log-derivative technique is used in the above derivation. The derivative of log x with respect to x is 1/x, so Eq (14) can be derived from the derivation rule of the composite function. Furthermore, Eq (14) is decomposed. $\pi_\theta(\tau)$ is the strategy adopted over the complete path τ from the beginning to the end. $\pi_\theta(\tau)$ is introduced to simplify $\nabla J(\pi_\theta)$ as Eq (15).
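For reference, the log-derivative step can be written out as a short derivation. This is a sketch in standard notation, not a reproduction of the original equations; the symbol $R(\tau)$ for the cumulative return of path τ is an assumption introduced here for illustration.

```latex
\nabla_\theta \pi_\theta(\tau)
  = \pi_\theta(\tau)\,\frac{\nabla_\theta \pi_\theta(\tau)}{\pi_\theta(\tau)}
  = \pi_\theta(\tau)\,\nabla_\theta \log \pi_\theta(\tau)
\quad\Longrightarrow\quad
\nabla_\theta J(\pi_\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\nabla_\theta \log \pi_\theta(\tau)\, R(\tau)\right]
```

The first equality multiplies and divides by $\pi_\theta(\tau)$; the second uses the chain rule with $\frac{d}{dx}\log x = \frac{1}{x}$. Writing the gradient as an expectation is what makes it possible to estimate it from sampled paths.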
So far, the gradient expression of the objective function has been obtained. However, the expected value cannot be calculated in the actual application process. This value can only be approximated by multiple sampling through the law of large numbers, as presented in Eq (16).
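The sampling approximation can be sketched in a few lines. The example below is a hedged illustration on a hypothetical one-step problem with three actions and fixed returns, using a softmax policy over the parameters; it is not the scheduling model of this work. The gradient is estimated by averaging over N sampled actions and then applied in a gradient ascent step.

```python
import math
import random

def softmax(theta):
    shift = max(theta)
    exps = [math.exp(t - shift) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def estimate_gradient(theta, reward_fn, n_samples, rng):
    """Monte Carlo estimate of grad J: the sample mean of grad log pi(a) * return."""
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    for _ in range(n_samples):
        action = rng.choices(range(len(theta)), weights=probs)[0]
        ret = reward_fn(action)
        for k in range(len(theta)):
            # For a softmax policy, d log pi(a) / d theta_k = 1[k == a] - pi(k).
            indicator = 1.0 if k == action else 0.0
            grad[k] += (indicator - probs[k]) * ret / n_samples
    return grad

rng = random.Random(0)            # fixed seed so the run is repeatable
theta = [0.0, 0.0, 0.0]           # policy parameters, one per action
rewards = [1.0, 2.0, 5.0]         # hypothetical return of each action
alpha = 0.1                       # step size of the gradient ascent update

for _ in range(500):
    grad = estimate_gradient(theta, lambda a: rewards[a], n_samples=16, rng=rng)
    theta = [t + alpha * g for t, g in zip(theta, grad)]  # ascent, not descent

print(softmax(theta))  # most of the probability mass should sit on the last action
```

A larger number of samples per update lowers the variance of the estimate at the cost of more interaction with the environment, which is the trade-off behind the choice of N.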
In Eq (16), N represents sampling N times of different τ, and t = 1~T represents the entire process of accumulating reward values from the beginning to the end. If N = 1, the gradient is updated every time a complete path is sampled. An improved Deep Deterministic Policy Gradient (DDPG) algorithm is used to realize the CDC task scheduling algorithm, aiming at the problems of high task scheduling cluster heterogeneity, significant cloud task volatility, and slow convergence speed of DRL algorithms [34]. Fig 2 reveals the implementation process of the DDPG algorithm.
The DDPG algorithm in Fig 2 is based on the Actor-Critic method. In terms of action output, a network that fits the strategy function directly outputs actions, which makes it possible to handle continuous actions. The Critic in DDPG also outputs the Q(s, a) value, but only for the experience sampled from the Replay buffer; no search for a maximum is required. Q(s, a) is only used to train the Actor Policy Network, and the final action is output directly by that network. The method also retains the advantages of the Actor-Critic class of methods: it combines value-based and strategy-based learning to give the best strategy directly, evaluates the candidate strategy through the Critic, and constantly modifies the Actor's strategy (s, a) [35]. Eq (17) indicates the action expectation value of the agent in the Critic network.
$$Q(s_t, a_t \mid \theta^{Q}) = \mathbb{E}\left[ r(s_t, a_t) + \gamma\, Q(s_{t+1}, a_{t+1} \mid \theta^{Q}) \right] \tag{17}$$
In Eq (17), $s_t$ and $a_t$ are the state and action of the Critic network, respectively; $s_{t+1}$ and $a_{t+1}$ are the state and action of the Critic network at the next moment; $\theta^{Q}$ represents the parameter of the Critic network, mainly fitting values of $s_t$ and $a_t$; γ is the discount factor.
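The quantity inside the expectation of Eq (17) can be sketched as a one-line computation. The function below is a hedged illustration: `critic` stands in for the Critic network as a plain Python callable, and all names and sample values are hypothetical.

```python
def td_target(reward, next_state, next_action, critic, gamma=0.99):
    """Return r(s_t, a_t) + gamma * Q(s_{t+1}, a_{t+1}) for one sampled transition."""
    return reward + gamma * critic(next_state, next_action)

# Hypothetical stand-in for the Critic network: any callable Q(s, a) works here.
critic = lambda s, a: s + a

target = td_target(reward=1.0, next_state=2.0, next_action=1.0,
                   critic=critic, gamma=0.9)
print(target)  # approximately 3.7, that is 1.0 + 0.9 * (2.0 + 1.0)
```

In training, a Critic is typically fitted so that its estimate for the current state and action moves toward this target, averaged over transitions drawn from the replay buffer.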
Then, the action execution strategy of the Actor network in the $s_t$ state can be written as Eq (18).
$$Q(s_t, u(s_t) \mid \theta^{Q}) = \mathbb{E}\left[ r(s_t, u(s_t)) + \gamma\, Q(s_{t+1}, u(s_{t+1}) \mid \theta^{Q}) \right] \tag{18}$$
As can be seen from Eq (18), the action execution policy of the Actor network is the action $u(s_t)$ executed by the action policy u of the agent in the Critic network.
The terminal device provides an AC power supply plus a high-voltage DC supply, and the high-voltage DC is directly embedded into the CDC device by hot swap [36]. In CDC data centers, the scale of CDC equipment continues to expand, and the overall energy consumption also increases. Therefore, the autonomous energy-saving management technology of CDC data centers should be optimized to control and reduce energy consumption. The traditional data center management methods and strategies are determined during deployment and operated following fixed patterns and processes. If these predefined management methods and strategies need to be adjusted, system administrators need to understand these methods and reconfigure the policy. At the same time, data centers operating in cloud environments usually reserve a considerable proportion of redundant resources to meet the demand of peak load. Nevertheless, the actual load is mostly at a low level. In this case, many hardware devices do not provide effective performance output while maintaining a high energy consumption.

Establishment of a CDC energy-saving model
The CNN uses the load and power consumption data in the system as the input. The network automatically generates an energy-saving strategy that conforms to the current system operating state and adjusts each component's operating mode. Under the premise of ensuring system stability and application performance requirements, the load of the entire monitored system is distributed more intensively to achieve a higher degree of energy consumption reduction. Fig 4 displays the CDC energy-saving workflow of CNNs.
Standardization and preprocessing of the original data (including data features and power consumption) are required before training a CNN. Then, the CNN model is trained to verify the trained model and check whether the basic error requirements are met. Finally, the whole model is applied to the cloud environment to independently complete the real-time data center

Establishment of the CDC task scheduling model
When establishing a corresponding CDC energy-saving platform, it is necessary to allocate and manage the cloud data center's relevant resources according to users' needs to enhance the utilization rate of resources. The purpose of CDC task scheduling is to ensure that the task information submitted by users can be optimally scheduled to reach the maximum limit of the data processing capacity of the cloud data center. The main goals of CDC task scheduling include optimal span, service quality, economic principles, and load balancing. Fig 5 shows the task scheduling principle of the cloud data center.
The user submits the task to the cloud data center scheduling server (CDCSS). The CDCSS assigns the task to the appropriate virtual machine according to the task scheduling algorithm. A task queue is set in the CDCSS to store tasks submitted by users. The length of the task queue is unlimited, so it can store all the different types of tasks submitted by different users. Three modules in the CDCSS share data: the task scheduling algorithm, the status monitor, and the replay memory unit. Fig 6 illustrates the cloud task scheduling strategy of the improved DDPG algorithm designed according to the PG algorithm.
Table 2 lists the experimental environment and configuration of this paper. According to the experimental environment configuration in Table 2, the Lawrence Livermore National Laboratory dataset is used to test the CDC energy-saving model and CDCSS. The Lawrence Livermore National Laboratory dataset includes detailed load data for large-scale CDC systems worldwide. The data link is Lawrence Livermore National Laboratory (llnl.gov). The data set link used by the article is: https://www.cs.huji.ac.il/labs/parallel/workload.
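The queue-and-dispatch flow of the CDCSS described above can be sketched with a simple baseline. The example below is a hedged illustration that uses Round-Robin dispatch rather than the improved DDPG strategy; the task lengths, virtual machine speeds, and function name are hypothetical, and all tasks are assumed to be queued at time zero.

```python
from collections import deque

def round_robin_schedule(task_lengths, vm_speeds):
    """Dispatch queued tasks to VMs in cyclic order; return each task's finish time.

    All tasks are assumed to be queued at time 0, and each VM runs its
    assigned tasks one at a time in the order they arrive.
    """
    queue = deque(enumerate(task_lengths))   # FIFO task queue of (task_id, length)
    vm_free_at = [0.0] * len(vm_speeds)      # time at which each VM becomes idle
    finish = {}
    next_vm = 0
    while queue:
        task_id, length = queue.popleft()
        vm_free_at[next_vm] += length / vm_speeds[next_vm]
        finish[task_id] = vm_free_at[next_vm]
        next_vm = (next_vm + 1) % len(vm_speeds)
    return finish

finish = round_robin_schedule(task_lengths=[4, 2, 6, 2], vm_speeds=[1.0, 2.0])
print(finish)  # {0: 4.0, 1: 1.0, 2: 10.0, 3: 2.0}
```

Round-Robin ignores both task length and machine speed, which is why the slow machine ends up with the long task in this example. A learned policy has room to improve on such a baseline precisely because it can take the monitored state into account.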

Experimental results of the CDC energy-saving model
The CDC energy-saving model is tested and simulated in different scenarios. In Scenario 1, the requirements of the Plant Management System and the Virtual Mimicking System are set to 100, respectively; In Scenario 2, the requirements of the Plant Management System and the Virtual Mimicking System are set to 100 and 150, respectively; In Scenario 3, the requirements of the Plant Management System and the Virtual Mimicking System are set to 150 and 100, respectively. At the same time, the Elman Neural Network algorithm, ecoCloud algorithm, and CNN algorithm are applied to analyze the energy consumption and the number of virtual machine migrations in the cloud data center. Fig 7 presents the experimental results of the CDC energy-saving model with different algorithms.
As can be seen from Fig 7(A), CNN has the lowest energy consumption in the CDC energy-saving model compared with Elman and ecoCloud algorithms. In Scenario 1, the energy consumption of CNN is 30.13% lower than the Elman algorithm and 45.89% lower than the ecoCloud algorithm. In Scenario 2, the energy consumption of CNN is 25.76% lower than the Elman algorithm and 40.80% lower than the ecoCloud algorithm. In Scenario 3, the energy consumption of CNN is 29.02% lower than the Elman algorithm and 37.48% lower than the ecoCloud algorithm. It is found that the energy consumption of the CNN algorithm in the CDC energy-saving model is better than that of the Elman algorithm and the ecoCloud algorithm. In Fig 7(B), the ecoCloud algorithm has the lowest number of virtual machine migrations in the three scenarios, with an average of 484. The CNN and Elman algorithms have similar virtual machine migration times in the three scenarios; the average migrations are 1,097 times and 1,199 times. However, according to the CDC principle of ecoCloud, most of the server energy consumption of the ecoCloud algorithm is the resource consumption when the server is idle. Therefore, considering the execution of the CNN algorithm and Elman algorithm, it can be found that compared with the Elman algorithm, the CNN algorithm reduces the number of VM migrations in CDC energy-saving model by 9.30%.
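The migration percentage can be checked directly from the two reported averages. As a quick arithmetic sketch, the difference of 102 migrations corresponds to 9.30% when expressed relative to the CNN average and to about 8.51% when expressed relative to the Elman average; the variable names are illustrative.

```python
cnn_avg, elman_avg = 1097, 1199      # average VM migrations reported above
diff = elman_avg - cnn_avg           # 102 fewer migrations with the CNN algorithm

print(f"relative to Elman: {diff / elman_avg:.2%}")  # 8.51%
print(f"relative to CNN:   {diff / cnn_avg:.2%}")    # 9.30%
```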

Experiment results of CDC task scheduling strategy
The command response times of the Random, Earliest, Round-Robin (RR), DQN, and DDPG algorithms used in the cloud data center are compared in different scenarios. Combined with the response time of CDC task scheduling in the three scenarios in Fig 8, it is found that the DDPG algorithm has the highest instruction response time in all scenarios, with an average of 141 across the three scenarios. As the difficulty of the model scenario and the number of submissions change, the gap in average response time between the DDPG algorithm and the RR algorithm gradually narrows. The average response time is 111 for the Random algorithm, 129 for the Earliest algorithm, 137 for the RR algorithm, and 126 for the DQN algorithm. Therefore, the DDPG algorithm performs the best in the task scheduling strategy of the cloud data center, whereas DQN, the other DRL algorithm, performs poorly.

Discussion
The work shows that the CNN algorithm outperforms the Elman algorithm and ecoCloud algorithm for energy consumption in cloud data energy-saving models. Qu et al. (2022) established an energy consumption model based on the Elman algorithm in studying energy consumption in cloud computing environments and energy-saving scheduling tasks. They found that the accuracy of the energy consumption model based on the Elman algorithm is higher than that of the multiple linear regression model [37]. Their result supports the choice made in this paper to compare the CNN algorithm against the Elman algorithm for CDC energy consumption. The command response time of the DDPG algorithm is the highest under different scenarios, with an average of 141 across the three scenarios. The gap in average response time between the DDPG algorithm and the RR algorithm gradually narrows as the difficulty of the scenario setting and the number of test submissions change. Shi et al. (2020) proposed two efficient cloud task scheduling algorithms based on DRL, the DQN algorithm and the DDPN algorithm. The results showed that the DDPN algorithm converged faster and outperformed the traditional algorithms in terms of optimization metrics such as command response time ratio and load balancing [38]. That study supports the feasibility of the results of this work.

Conclusion
This work utilizes a CNN to adjust each hardware resource module's power and operation mode in the cloud data center. In addition, the energy consumption of each single hardware resource module is reduced as much as possible while ensuring system stability and application performance. Subject to these stability and application performance requirements, the CDC monitoring system's load is distributed more centrally so that some CDC devices can enter standby, power off, or other equivalent states to reduce energy consumption. At the same time, the improved PG algorithm is used to analyze the time response of the task scheduling strategy of the cloud data center. It indirectly reflects the energy-saving effect of the task scheduling time of the cloud data center, markedly reducing the data computing response time. Relevant experiments are carried out using the Lawrence Livermore National Laboratory data set. The experiments suggest that the energy consumption of the CNN algorithm in the CDC energy-saving model is better than that of the Elman algorithm and the ecoCloud algorithm. The CNN and Elman algorithms have similar virtual machine migration times in the three scenarios, with an average of 1,097 and 1,199 times, respectively. Compared with the Elman algorithm, the CNN algorithm reduces the number of virtual machine migrations in the CDC energy-saving model by 9.30%. The command response time of the DDPG algorithm in different scenarios is the highest, with an average of 141 across the three scenarios. Furthermore, it is found that the response time of the algorithm increases with the number of test submissions and with the difficulty of the model scenario setup. The difference between the average response time of the DDPG algorithm and the RR algorithm gradually becomes smaller.
This paper studies the neural network algorithms in the scenario of high heterogeneity of CDC clusters and significant task volatility. The algorithm proposed here has good convergence, greatly improves the command response time ratio of tasks, and promotes cluster load balancing. It realizes the computing energy-saving strategy of the cloud data center. However, there are still some shortcomings. This work highlighted the practicability of the CDC energy-saving model mainly from the energy consumption of computing equipment in the cloud data center but did not consider other energy-saving factors. Future research will comprehensively consider the energy-saving factors of cloud data center computing for a comprehensive CDC energy-saving strategy.